Abstract: We describe a novel integrated algorithm for real-time
enhancement of video acquired under challenging lighting
conditions. Such conditions include low lighting, haze, and high
dynamic range situations. The algorithm automatically detects the
dominant source of impairment; then, depending on whether it is low
lighting, haze or another condition, a corresponding pre-processing
step is applied to the input video, followed by the core enhancement
algorithm. Temporal and spatial redundancies in the video input are
utilized to facilitate real-time processing and to improve the temporal
and spatial consistency of the output. The proposed algorithm can be
used as an independent module, or be integrated into either a video
encoder or a video decoder for further optimization.

I. Introduction 

As video surveillance equipment and mobile devices such
as digital cameras, smart phones and netbooks are increasingly 
widely deployed, cameras are expected to acquire, record and 
sometimes compress and transmit video content in all lighting 
and weather conditions. The majority of cameras, however, are 
not specifically designed to be all-purpose and weather-proof, 
rendering the video footage unusable for critical applications 
under many circumstances. 

Image and video processing and enhancement techniques, including
gamma correction, de-hazing and de-blurring, are well-studied
areas with many successful algorithms proposed over
the years. Although different algorithms perform well for
different lighting impairments, they often require tedious and
sometimes manual input-dependent fine-tuning of algorithm
parameters. In addition, different specific types of impairments
often require different specific algorithms.

Take the enhancement of videos acquired under low lighting
conditions as an example. To mitigate the problem, far and
near infrared based techniques ([1], [2], [3], [4]) are used in
many systems, and at the same time, various image processing
based approaches have also been proposed. Although far and
near infrared systems are useful for detecting objects such
as pedestrians and animals in low lighting environments,
especially in "professional" video surveillance systems, they
suffer from the common disadvantage that detectable objects
must have a temperature that is higher than their surroundings.
In many cases where the critical object has a temperature
similar to its surroundings, e.g. a big hole in the road,
infrared systems are not as helpful. Furthermore, infrared
systems are usually more expensive and harder to maintain, and have
a relatively shorter life-span than conventional systems. They
also introduce extra, and often considerable, power
consumption. In many consumer applications such as video
capture and communications on smart phones, it is usually
not feasible to deploy infrared systems due to such cost
and power consumption issues. Conventional low lighting
image and video enhancement algorithms such as
[5] and [6] often work by reducing noise in the input low
lighting video, followed by contrast enhancement techniques
such as tone-mapping, histogram stretching and equalization,
and gamma correction to recover visual information in low
lighting images and videos. Although these algorithms can
lead to very visually pleasing enhancement results, they are
usually too complicated for practical real-time applications,
especially on mobile devices. For example, the processing
speed of the algorithm in [5] was only 6 fps even with GPU
acceleration. In [6], recovering each single image required
more than one minute.

In this paper, we describe a novel integrated video enhancement
algorithm applicable to a wide range of input impairments.
Its computational and memory complexities are low enough to be
within the capabilities of many
mobile devices. In our system, a low complexity automatic
module first determines the predominant source of impairment
in the input video. The input is then pre-processed based on
the particular source of impairment, followed by processing
by the core enhancement module. Finally, post-processing is
applied to produce the enhanced output. In addition, spatial
and temporal correlations are utilized to improve the speed
of the algorithm and the visual quality of the output, enabling
it to be embedded into video encoders or decoders and to share
the temporal and spatial prediction modules of the video codec to
further lower complexity.

The paper is organized as follows. In Section II,
we present the heuristic evidence that motivated the idea in
this paper. In Section III, we explain the core enhancement
algorithm in detail, while in Section IV we describe various
algorithms for reducing the computational and memory
complexities. Section V contains the experimental results. Given
that in real-world applications, the video enhancement module
could be deployed at multiple stages of the end-to-end
procedure, e.g. before compression and transmission/storage, or
after compression and transmission/storage but before
decompression, or after decompression and before the video content
is displayed on the monitor, we examine the complexity and RD
tradeoffs associated with applying the proposed algorithm at
these different steps in the experiments. Finally, we conclude
the paper and discuss future work in Section VI.



II. A Novel Integrated Algorithm for Video 
Enhancement 
Fig. 1. Examples of original (Top), inverted low lighting videos/images
(Middle) and haze videos/images (Bottom). 



The motivation for our algorithm is a key observation that
if we perform a pixel-wise inversion of low lighting videos or
high dynamic range videos, the results look quite similar to
hazy videos. As an illustrative example, we randomly selected
(via Google) and captured a total of 100 images and video
clips in hazy, low lighting and high dynamic range
conditions respectively. Some examples are shown in Fig. 1.
Here, the "inversion" operation is simply



$$R^c(x) = 255 - I^c(x), \tag{1}$$



where $I^c(x)$ and $R^c(x)$ are the intensities of color (RGB)
channel c at pixel x in the input and inverted
frames, respectively.
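The inversion in (1) can be sketched in a few lines. This is a minimal illustration, assuming 8-bit RGB frames stored as nested Python lists of (r, g, b) tuples; the storage layout and the name `invert_frame` are assumptions made for the example, not part of the described system.

```python
def invert_frame(frame):
    """Pixel-wise inversion R^c(x) = 255 - I^c(x) for an 8-bit RGB frame.

    frame: list of rows, each row a list of (r, g, b) tuples (assumed layout).
    """
    return [[tuple(255 - c for c in pixel) for pixel in row] for row in frame]


# Tiny 2 x 2 example frame (made-up values).
frame = [[(10, 20, 30), (0, 0, 0)],
         [(255, 128, 64), (200, 200, 200)]]
inverted = invert_frame(frame)
```

Applying the operation twice returns the original frame, which is why the same step can be reused to map a de-hazed result back to the original domain.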

As can be clearly seen from Fig. 1, at least visually,
videos captured in hazy weather are similar to the inverted output
of videos captured in low lighting and high dynamic range
conditions. This is intuitive because, as illustrated in [7], in
all these cases, e.g. hazy videos and low lighting videos, light
captured by the camera is blended with the airlight (ambient
light reflected into the line of sight by atmospheric particles).
The only difference is the actual brightness of the airlight:
white in the case of haze videos, black in the case of low
lighting and high dynamic range videos.

The observation is confirmed by various haze detection
algorithms. We implemented haze detection using the HVS
threshold range based method [8], the Dark Object Subtraction
(DOS) approach [9] and the spatial frequency based technique
[10], and found that hazy videos, inverted low lighting videos and
inverted high dynamic range videos were all classified as hazy
video clips, as opposed to "normal" clips.

Fig. 2. The histogram of the minimum intensity of each pixel's three color
channels of haze videos (Top), low lighting videos (Middle) and high dynamic
range videos (Bottom).

TABLE I
Results of chi-square tests

Data of chi-square test                               Degrees of freedom   Chi-square value
Haze videos and inverted low lighting videos          7                    13.21
Haze videos and inverted high dynamic range videos    7                    11.53

Fig. 3. Examples of hazy videos/images (Left) and their dark channel images
(Right).

Fig. 4. Examples of low lighting videos/images (Left) and their dark channel
images (Right).

We also performed the chi-square test to examine the
statistical similarities between hazy videos and inverted low
lighting and high dynamic range videos. The chi-square test is
a standard statistical tool widely used to determine if observed
data are consistent with a specific hypothesis. As explained in
[11], in chi-square tests, a p value is calculated, and usually,
if p > 0.05, it is reasonable to assume that the deviation of
the observed data from the expectation is due to chance alone.
In our experiments, the expected distribution was calculated
from hazy videos and the observed statistics from inverted low
lighting and high dynamic range videos were tested. In the
experiments, we divided the range [0, 255] of color channel
intensities into eight equal intervals, corresponding to
7 degrees of freedom. According to the chi-square distribution
table, if we adopt the common standard of p > 0.05, the
corresponding upper threshold for the chi-square value is
14.07. The histograms of the minimum intensities of all color
channels of all pixels for hazy videos, inverted low lighting
videos and inverted high dynamic range videos were used in the tests;
some examples are shown in Fig. 2. The results of the chi-square
tests are given in Table I. As can be seen from the table,
both chi-square values are below 14.07, indicating
that our hypothesis of the similarities between haze videos and
inverted low lighting videos, and between haze videos and inverted
high dynamic range videos, is reasonable.
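The test can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used in the experiments: the expected histogram is rescaled to the observed pixel count before comparison (the normalization is not specified above), the function names are hypothetical, and the bin counts in the example are made-up numbers, not measured data.

```python
CHI2_CRITICAL_DF7_P05 = 14.07  # threshold for 7 degrees of freedom at p = 0.05


def min_channel_histogram(pixels, bins=8):
    """Histogram of min(r, g, b) per pixel over equal-width bins of [0, 255]."""
    width = 256 // bins
    counts = [0] * bins
    for r, g, b in pixels:
        counts[min(r, g, b) // width] += 1
    return counts


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic between two histograms.

    Assumption: expected counts are rescaled so that both histograms
    describe the same total number of pixels.
    """
    scale = sum(observed) / sum(expected)
    stat = 0.0
    for o, e in zip(observed, expected):
        e_scaled = e * scale
        if e_scaled > 0:  # skip empty expected bins to avoid division by zero
            stat += (o - e_scaled) ** 2 / e_scaled
    return stat


# Made-up bin counts for illustration only.
expected_hazy = [10, 20, 30, 40, 40, 30, 20, 10]
observed_inverted = [12, 18, 33, 38, 41, 29, 19, 10]
stat = chi_square_statistic(observed_inverted, expected_hazy)
consistent = stat < CHI2_CRITICAL_DF7_P05
```

With eight bins there are 7 degrees of freedom, so a statistic below 14.07 means the observed histogram does not deviate significantly from the hazy reference at the p > 0.05 level.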

Through the experiments, we also found that pixels
whose minimum intensity over the three color channels was
low had a very high probability of being located in regions such as
houses, vehicles, etc. We introduce the concept of Regions
of Interest (ROIs) for these regions. To visually demonstrate
the ROIs, we calculated the image of minimum intensities of
color channels for hazy videos, inverted low lighting videos
and inverted high dynamic range videos. Three examples are
shown in Fig. 3, Fig. 4 and Fig. 5.

In conclusion, through visual observation and statistical 
tests, we found that video captured in a number of challenging 
lighting conditions is statistically and visually similar to hazy 
videos. Therefore, it is conceivable that a generic core module 
could be used for the enhancement of all these cases. 

III. A Generic Video Enhancement Algorithm 
Based on Image De-Hazing 

A. Core De-Hazing Based Enhancement Module 

Since, after proper pre-processing, videos captured in challenging
lighting conditions, e.g. low lighting and high dynamic
range, exhibit strong similarities with hazy videos in both the
visual and statistical domains, the core enhancement algorithm
in our proposed system is an improved de-hazing algorithm
based on [12].

Fig. 5. Examples of high dynamic range videos/images (Left) and their dark
channel images (Right).

As mentioned above, most of the existing advanced haze-removal
algorithms ([13], [12], [14], and [15]) are based on
the well-known degradation model proposed by Koschmieder
in 1924 [7]:



$$R(x) = J(x)\,t(x) + A\,(1 - t(x)), \tag{2}$$



where A is the global airlight, R(x) is the intensity of pixel
x that the camera captures, J(x) is the intensity of the
original objects or scene, and t(x) is the medium transmission
function describing the percentage of the light emitted from the
objects or scene that reaches the camera. This model assumes
that each degraded pixel is a combination of the airlight
and the unknown surface radiance. The medium transmission
is determined by the
scene depth and the scattering coefficient of the atmosphere.
For the same video, where the scattering coefficient of the
atmosphere is constant, the light is more heavily affected by
the airlight in sky regions because of the longer distance. In
other regions such as vehicles and houses, especially those
nearby, the light is less affected by the airlight.

The critical part of all the algorithms based on the
Koschmieder model is to estimate A and t(x) from the recorded
image intensity I(x) so as to recover J(x) from I(x). For
example, in [13], Independent Component Analysis is used to
estimate the medium transmission and the airlight. In [12], the
medium transmission and airlight are estimated by the Dark
Channel method, based on the assumption that the medium
transmission in a local patch is constant.

Fig. 6. Examples of processing steps of low lighting enhancement algorithm:
input image I (Top left), inverted input image R (Top right), haze removal
result J of the image R (Bottom left), and output image (Bottom right).

In our system, we estimate t(x) according to [12] using

$$t(x) = 1 - \omega \min_{c \in \{r,g,b\}} \left\{ \min_{y \in \Omega(x)} \frac{R^c(y)}{A^c} \right\}, \tag{3}$$

where ω = 0.8 and Ω(x) is a local 9 × 9 block centered
at x in this paper. As our system also targets applications
on mobile devices, the CPU- and memory-costly soft matting
method proposed in [12] is not implemented in our algorithm.
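A direct, unoptimized sketch of the estimate in (3) is given below. It assumes frames stored as nested lists of (r, g, b) tuples and a non-zero airlight per channel; clipping the 9 × 9 window at the frame borders is also an assumption, since border handling is not specified above, and the function name is hypothetical.

```python
def estimate_transmission(frame, airlight, omega=0.8, patch=9):
    """Estimate t(x) = 1 - omega * min_c min_{y in patch} R^c(y) / A^c.

    frame: rows of (r, g, b) tuples; airlight: (A_r, A_g, A_b), all non-zero.
    The patch is clipped at the frame borders (assumption).
    """
    h, w = len(frame), len(frame[0])
    half = patch // 2
    # Per-pixel minimum over the color channels of R^c / A^c.
    norm_min = [[min(frame[y][x][c] / airlight[c] for c in range(3))
                 for x in range(w)] for y in range(h)]
    t = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            rows = range(max(0, y - half), min(h, y + half + 1))
            cols = range(max(0, x - half), min(w, x + half + 1))
            dark = min(norm_min[yy][xx] for yy in rows for xx in cols)
            t[y][x] = 1.0 - omega * dark
    return t


# Example: one black pixel in an otherwise uniform row, 3-pixel-wide patch.
row = [(0, 0, 0)] + [(100, 100, 100)] * 4
t_map = estimate_transmission([row], (100, 100, 100), patch=3)
```

Pixels whose patch contains the dark pixel get t = 1 (treated as haze-free), while the rest get t = 1 - ω = 0.2. The nested loops cost on the order of 81 operations per pixel for a 9 × 9 patch, which is consistent with t(x) dominating the run time.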

To estimate airlight, we first note that the schemes in
existing image haze removal algorithms are usually not robust:
even very small changes to the airlight value might lead
to very large changes in the recovered images or video frames.
Therefore, calculating airlight frame-wise not only increases
the overall complexity of the system, but also introduces
visual inconsistency between frames, thereby creating annoying
visual artifacts. Fig. 7 shows an example using the results of
the algorithm in [12]. Notice the difference between the first
and fourth frames in the middle row.

Based on this observation, we propose to calculate airlight 
only once for a Group of Pictures (GOP). This is done for 
the first frame of the GOP, then the same value is used for all 
subsequent frames in the same GOP. In the implementation, 
we also incorporated a scene change detection module so as 
to detect sudden changes in airlight that are not aligned with 
GOP boundaries but merit recalculation. 

In our system, to estimate airlight, we first select 100 pixels 
whose minimum intensities in all color (RGB) channels are the 
highest in the image. Then from these pixels, we choose the 
single pixel whose sum of RGB values is the highest. Among 
successive GOPs, we refresh the value of airlight using the 
equation 



$$A = 0.4\,A + 0.6\,A_t, \tag{4}$$



where A_t is the airlight value calculated for the current GOP and A is the
global airlight value. This efficiently avoids abrupt changes
in the global airlight value A, which yields good
recovered results and saves a large amount of computation at
the same time. Examples of the recovered results are shown
in Fig. 7. With our algorithm, the first and fourth frames in the
bottom row change gradually.
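The airlight selection and the update in (4) can be sketched as below, assuming frames stored as nested lists of (r, g, b) tuples. The function names are hypothetical, and ties between candidate pixels are broken by scan order, which is an arbitrary choice made for the example.

```python
def estimate_airlight(frame, top_k=100):
    """Pick the airlight pixel for the first frame of a GOP.

    Step 1: keep the top_k pixels with the highest min(r, g, b).
    Step 2: among them, return the pixel with the highest r + g + b.
    """
    pixels = [p for row in frame for p in row]
    candidates = sorted(pixels, key=min, reverse=True)[:top_k]
    return max(candidates, key=sum)


def update_airlight(global_a, gop_a):
    """Blend per (4): A = 0.4 * A + 0.6 * A_t, applied per channel."""
    return tuple(0.4 * a + 0.6 * a_t for a, a_t in zip(global_a, gop_a))


# Made-up 2 x 2 frame; top_k is reduced to 2 to fit the tiny example.
frame = [[(250, 250, 250), (255, 10, 10)],
         [(245, 255, 255), (0, 0, 0)]]
a_t = estimate_airlight(frame, top_k=2)
a = update_airlight((100, 100, 100), (200, 200, 200))
```

In the example, the saturated red pixel is rejected in the first step because its minimum channel is low, and the blend moves the global value only 60% of the way toward the new estimate.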



Then, from (2), we can find

$$J(x) = \frac{R(x) - A}{t(x)} + A. \tag{5}$$



Although (5) works reasonably well for haze removal, through
experiments we found that direct application of equation
(5) might lead to under-enhancement of low lighting areas
and over-enhancement of high lighting areas when applied
to low lighting video enhancement. To further optimize the
calculation of t(x), we focus on enhancing the ROIs while
avoiding processing the background, e.g. sky regions in low
lighting and high dynamic range videos. This not only further
reduces computational complexity, but also improves overall
visual quality. To this end, we adjust t(x) adaptively while
maintaining its spatial continuity, so that the resulting video
becomes visually smoother. We introduce a multiplier
P(x) into equation (5), and through extensive experiments,
we find that P(x) can be set as

$$P(x) = \begin{cases} 2t(x), & 0 < t(x) < 0.5, \\ -2t^2(x) + 8 - [\ldots], & 0.5 < t(x) < 1. \end{cases} \tag{6}$$

Then (5) becomes

$$J(x) = \frac{R(x) - A}{P(x)\,t(x)} + A. \tag{7}$$

The idea behind (6) is as follows. When t(x) is smaller
than 0.5, which means that the corresponding pixel needs
boosting, we assign P(x) a small value to make P(x)t(x)
even smaller so as to increase the RGB intensities of this pixel.
On the other hand, when t(x) is greater than 0.5, we refrain
from overly boosting the corresponding pixel intensity. When
t(x) is close to 1, P(x)t(x) may be larger than 1, resulting in
slight "dulling" of the pixel, so as to make the overall visual
quality more balanced and pleasant.

For low lighting and high dynamic range videos, once J(x)
is recovered, the inversion operation (1) is performed again
to produce the enhanced version of the original input. This
process is conceptually shown in Fig. 6. The improvement
after introducing P(x) can be seen in Fig. 9.
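The per-pixel recovery and the invert, de-haze, invert sequence can be illustrated as below for a single channel value. The multiplier is passed in as a plain number, so p = 1 corresponds to (5) and any other value to (7). The lower bound `t_floor` on the denominator and the clipping to [0, 255] are assumptions added to keep the sketch numerically safe, and the function names are hypothetical.

```python
def recover_value(r, t, a, p=1.0, t_floor=0.05):
    """Recover J = (R - A) / (P * t) + A for one channel value.

    p = 1.0 gives (5); passing the multiplier P(x) gives (7).
    t_floor (assumption) guards against division by very small values.
    """
    denom = max(p * t, t_floor)
    j = (r - a) / denom + a
    return min(255.0, max(0.0, j))


def enhance_low_light_value(i, t, a, p=1.0):
    """Invert the input with (1), recover J, then invert back."""
    r = 255 - i
    return 255 - recover_value(r, t, a, p)


# Illustrative values: dark input intensity 40, t = 0.4, airlight 250
# (the airlight is estimated on the inverted frame).
t = 0.4
plain = enhance_low_light_value(40, t, 250)             # uses (5)
boosted = enhance_low_light_value(40, t, 250, p=2 * t)  # P = 2t for t < 0.5
```

For this dark pixel, the smaller denominator P(x)t(x) pushes the recovered value further from the airlight in the inverted domain, so the final output is brighter than with (5) alone.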

B. Automatic Impairment Source Detection 

As mentioned above, we use the generic video enhancement
algorithm of the previous subsection for enhancing video
acquired in a number of challenging lighting conditions. In
addition to this core enhancement module, the overall system
also contains a module for automatically detecting the main
source of visual quality degradation to determine if
pre-processing by pixel-wise inversion is required. In the case
when pixel-wise inversion is required, different pixel-wise fine
tuning may also be introduced so that the eventual output after
enhancement is further optimized. The flow diagram for this
automatic detection system is shown in Fig. 8.

Our detection algorithm is based on the technique intro- 
duced by R. Lim et al. HI. To reduce complexity, we only 
perform the automatic detection for the first frame in a GOP, 
coupled with a scene change detection. The corresponding 
algorithm parameters are shown in Table II. The test is
conducted for each pixel in the frame. If the percentage of hazy


Fig. 7. Comparison of original, haze removal, and optimized haze removal video clips. Top: input video sequences. Middle: outputs of the image haze
removal algorithm of [12]. Bottom: outputs of haze removal using our optimized algorithm for calculating airlight.




TABLE II
Specific parameters of the haze detection algorithm.

Color attribute    Threshold range
                   ~ 255
                   ~ 130
                   ~ 255
                   90 240
Fig. 8. Flow diagram of the module for determining the dominant source of
impairment.



pixels in a picture is higher than 60%, we consider the picture
to be a hazy picture. Similarly, if an image is determined to be a
hazy picture after inversion, it is labeled as a low lighting
or high dynamic range image, both of which require the
introduction of the multiplier P(x) into the core enhancement
algorithm.
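The decision logic can be sketched as follows. The per-pixel haze test is passed in as a function because its actual thresholds (Table II) are not reproduced here; `toy_is_hazy_pixel` is a made-up placeholder used only to make the example runnable, and the "normal" fallback label is an assumption about frames that match neither case.

```python
def classify_impairment(frame, is_hazy_pixel, hazy_fraction=0.6):
    """Label a frame as 'haze', 'low_light_or_hdr' or 'normal'.

    frame: rows of (r, g, b) tuples; is_hazy_pixel: per-pixel predicate.
    A frame counts as hazy when more than hazy_fraction of its pixels pass.
    """
    def hazy_ratio(f):
        pixels = [p for row in f for p in row]
        return sum(1 for p in pixels if is_hazy_pixel(p)) / len(pixels)

    if hazy_ratio(frame) > hazy_fraction:
        return "haze"
    # Not hazy as captured: invert with (1) and test again.
    inverted = [[tuple(255 - c for c in p) for p in row] for row in frame]
    if hazy_ratio(inverted) > hazy_fraction:
        return "low_light_or_hdr"
    return "normal"


def toy_is_hazy_pixel(pixel):
    # Placeholder predicate, NOT the thresholds of Table II.
    return min(pixel) > 150


bright = [[(220, 220, 220)] * 4 for _ in range(4)]
dark = [[(10, 10, 10)] * 4 for _ in range(4)]
label_bright = classify_impairment(bright, toy_is_hazy_pixel)
label_dark = classify_impairment(dark, toy_is_hazy_pixel)
```

Frames labeled "low_light_or_hdr" would be inverted before the core module and processed with the multiplier P(x); frames labeled "haze" go to the core module directly.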



Fig. 9. Examples of optimizing low lighting and high dynamic range 
enhancement algorithm by introducing P(x): Input (Left), output of the 
enhancement algorithm without introducing P(x) (Middle), and output of 
the enhancement algorithm by introducing P(x) (Right). 



IV. Acceleration of Proposed Video
Enhancement Processing Algorithm



The algorithm described in Section III is a frame based
approach. Through experiments, we found that the
calculation of t(x) occupies about 60% of the total
computation time. For real-time and low complexity processing
of video inputs, it is not desirable to apply the algorithm
of Section III on a frame by frame basis, which not only
has high computational complexity, but also makes the output
much more sensitive to temporal and spatial noise, and
destroys the temporal and spatial consistency of the processed
outputs, thereby lowering the overall perceptual quality.

To solve these problems, we note that t(x) and
other model parameters are correlated temporally and
spatially. Therefore, we propose to accelerate the algorithm by
introducing motion estimation.

Motion estimation/compensation (ME/MC) is a key procedure
in state-of-the-art video compression standards. By
matching blocks in subsequently encoded frames to find the
"best" match between a block to be encoded and a block of the
same size that has already been encoded and then decoded
(referred to as the "reference"), video compression algorithms
use the reference as a prediction of the block to be encoded and
encode only the difference (termed the "residual") between
the reference and the block to be encoded, thereby reducing
the rate that is required to encode the current block to a
given fidelity level. The process of finding the best match between
a block to be encoded and a block in a reference frame is
called "motion estimation", and the "best" match is usually
determined by jointly considering the rate and distortion costs
of the match. If a "best" match block is found, the current
block will be encoded in inter mode and only the residual will
be encoded. Otherwise, the current block will be encoded in
intra mode. The most commonly used metric for distortion in
motion estimation is the Sum of Absolute Differences (SAD).




Fig. 11. Subsampling pattern of proposed fast SAD algorithm. 




Fig. 10. Differences of t(x) values between the pixels of the predicted block
and those of its reference block (horizontal axis: relative difference of t(x)).

To verify the feasibility of using temporal block matching
and ME to expedite the t(x) calculation, we calculated the
differences of t(x) values for pixels in the predicted and reference
blocks. The statistics in Fig. 10 show that the differences are
less than 10% in almost all cases. Therefore, we could utilize
ME/MC to accelerate the computationally intensive calculation
of t(x), and only needed to calculate t(x) for a few selected
frames. For the non-critical frames, we used the corresponding
t(x) values of the reference pixels. To reduce the complexity
of the motion estimation process, we used mature fast motion
estimation algorithms, e.g. Enhanced Prediction Zonal Search
(EPZS) lHH. When calculating the SAD, similar to (HI and 
ifTSl , we only utilized a subset of the pixels in the current 



and reference blocks using the pattern shown in Fig. 11. With
this pattern, our calculation "touched" a total of 60 pixels in
a 16 × 16 block, or roughly 25%. These pixels were located
on either the diagonal or the edges, resulting in about a 75%
reduction in SAD calculation when implemented in software
on a general purpose processor.
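A subsampled SAD can be sketched as a sum over a fixed list of pixel positions. The exact pattern of Fig. 11 is not reproduced here; as a stand-in, the example below uses only the border pixels of the block, which happens to give 60 positions for a 16 × 16 block. Both the mask and the function names are assumptions made for illustration.

```python
def border_mask(size=16):
    """Positions on the outer edge of a size x size block (stand-in pattern)."""
    last = size - 1
    return [(y, x) for y in range(size) for x in range(size)
            if y in (0, last) or x in (0, last)]


def subsampled_sad(cur_block, ref_block, mask):
    """Sum of absolute differences evaluated only at the masked positions."""
    return sum(abs(cur_block[y][x] - ref_block[y][x]) for y, x in mask)


mask = border_mask(16)
cur_block = [[10] * 16 for _ in range(16)]
ref_block = [[7] * 16 for _ in range(16)]
sad = subsampled_sad(cur_block, ref_block, mask)
```

Because the cost of the SAD is proportional to the number of positions visited, evaluating 60 of the 256 pixels removes roughly three quarters of the absolute-difference operations per candidate block.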

In our implementation, when the proposed algorithm is
deployed prior to video compression or after video decompression,
we first divide the input frames into GOPs. The GOPs
can either contain a fixed number of frames, or be decided based
on a maximum GOP size (in frames) and scene changes. Each
GOP starts with an Intra coded frame (I frame), for which all
t(x) values are calculated. ME is performed for the remaining
frames (P frames) of the GOP, similar to conventional video
encoding. To this end, each P frame is divided into
non-overlapping 16 × 16 blocks, for which a motion search using
the SAD is conducted. A threshold T is defined for the SAD
of blocks: if the SAD is below the threshold, which means
a "best" match block is found, the calculation of t(x) for
the entire macroblock (MB) is skipped. Otherwise, t(x) still needs to be
calculated. In both cases, the values for the current frame are
stored for possible use for the next frame. The flow diagram
is shown in Fig. 12. We call this acceleration algorithm the ME
acceleration enhancement algorithm.
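The per-block decision can be sketched as below for single-channel (luma) frames stored as nested lists. An exhaustive search over a small window stands in for a fast search such as EPZS, a full SAD is used instead of the subsampled one, and `compute_t` is a hypothetical callback that recomputes t(x) for a block; all of these are simplifications assumed for the example.

```python
def block_sad(cur, ref, cy, cx, ry, rx, size):
    """Full SAD between the block at (cy, cx) in cur and (ry, rx) in ref."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(size) for j in range(size))


def best_match(cur, ref, cy, cx, size=16, search=2):
    """Exhaustive search in a +/- search window; returns (sad, dy, dx)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= h - size and 0 <= rx <= w - size:
                sad = block_sad(cur, ref, cy, cx, ry, rx, size)
                if best is None or sad < best[0]:
                    best = (sad, dy, dx)
    return best


def transmission_for_block(cur, ref, ref_t, cy, cx, threshold, compute_t,
                           size=16, search=2):
    """Reuse t(x) from the matched reference block when SAD < threshold,
    otherwise recompute it. Returns (t_block, reused_flag)."""
    sad, dy, dx = best_match(cur, ref, cy, cx, size, search)
    if sad < threshold:
        ry, rx = cy + dy, cx + dx
        return [row[rx:rx + size] for row in ref_t[ry:ry + size]], True
    return compute_t(cy, cx, size), False


# Toy 8 x 8 example with 4 x 4 blocks: content shifted right by one pixel.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[0] + row[:-1] for row in ref]
ref_t = [[(8 * y + x) / 100 for x in range(8)] for y in range(8)]


def fallback(cy, cx, size):
    # Hypothetical stand-in for recomputing t(x) with (3).
    return [[1.0] * size for _ in range(size)]


t_block, reused = transmission_for_block(cur, ref, ref_t, 2, 2,
                                         threshold=1, compute_t=fallback,
                                         size=4)
```

In the toy example the block is found one pixel to the left in the reference frame with zero SAD, so its t(x) values are copied instead of recomputed.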
Fig. 12. Flow diagram of the core enhancement algorithm with ME
acceleration.

In addition to operating as a stand-alone module with
uncompressed pixel information as both the input and output,
the ME accelerated enhancement algorithm could also be
integrated into a video encoder or a video decoder. When the
algorithm is integrated with a video encoder, the encoder and
the enhancement can share the ME module. When integrated
with the decoder, the system has the potential of using the
motion information contained in the input video bitstream
directly, thereby bypassing the entire ME process. Such
integration will usually lead to an RD loss. The reason for
this loss is first and foremost that the ME module in the
encoder with which the enhancement module is integrated, or
in the encoder that produced the bitstream decoded by a decoder
with enhancement, may not be optimized for finding the
best matches in t(x) values. For example, when the enhancement
module is integrated with a decoder, it may have
to decode an input bitstream encoded by a low complexity
encoder using a very small ME range. The traditional SAD
or SAD-plus-rate metrics for ME are also not optimal for t(x)
match search. However, through extensive experiments with
widely used encoders and decoders, we found that such quality
losses were usually small, and well justified by the savings in
computational cost. The flow diagrams of integrating the ME
acceleration enhancement algorithm into the encoder and decoder
are shown in Fig. 15 and Fig. 16. Some of the comparisons
can be found in Section V.

V. Experimental Results 

To evaluate the proposed algorithm, a series of experiments
were conducted with a Windows PC (Intel Core 2 Duo
processor running at 2.0 GHz with 3 GB of RAM) and an iPhone
4. The resolution of the test videos in our experiments was
640 × 480.





Fig. 13. Examples of low lighting video enhancement algorithm: Original 
input (Top), and the enhancement result (Bottom). 

Examples of the enhancement outputs for low lighting, high
dynamic range and hazy videos are shown in Fig. 13, Fig. 14
and Fig. 17, respectively. As we can see from these figures,
the improvements in visibility are obvious. In Fig. 13, the
yellow light from the windows and signs such as "Hobby
Town" and Chinese characters were recovered in correct
color. In Fig. 14, the headlight of the car in the original input
made letters on the license plate very difficult to read. After
enhancement with our algorithm, the license plate became
much more legible. The algorithm also worked well for



Fig. 14. Examples of high dynamic range video enhancement algorithm: 
Original input (Top), and the enhancement result (Bottom). 



video captured in hazy, rainy and snowy weather, as shown
in Fig. 17, Fig. 18 and Fig. 19.

In addition, the proposed ME-based acceleration greatly
reduces the complexity of the algorithm with little information
lost. As mentioned above, there are three possible ways of
incorporating ME into the enhancement algorithm, i.e. through
a separate ME module in the enhancement system, or by
utilizing the ME module and information available in a video
encoder or decoder. Some example outputs of the frame-wise
enhancement algorithm and these three ways of incorporating
ME are shown in Fig. 23, with virtually no visual difference.
We also calculated the average RD curves of ten randomly
selected experimental videos using the three acceleration
methods. The reference was enhancement using the proposed
frame-wise enhancement algorithm in the YUV domain. The RD
curves of performing the frame-wise enhancement algorithm
before encoding or after decoding are shown in Fig. 20, while
the results for acceleration using a separate ME module are
given in Fig. 21, and those for integrating the ME acceleration into
the codec are shown in Fig. 22. As the RD curves in our
experiments reflect the aggregated outcome of both coding and
enhancement, and because enhancement was not optimized
for PSNR based distortion, the shape of our RD curves looks
different from RD curves for video compression systems, even
though distortion as measured in PSNR is still a monotonic
function of the rate. First, from the three figures, we find that in
general, performing enhancement before encoding has better
overall RD performance. Although enhancing after decoding
means we can transmit un-enhanced video clips, which usually
have lower contrast and less detail and are easier to compress,
the reconstructed quality after decoding and enhancement is heavily
Fig. 15. Flow diagram of the integration of encoder and ME acceleration enhancement algorithm.
Fig. 16. Flow diagram of the integration of decoder and ME acceleration enhancement algorithm.



affected by the loss of quality during encoding, leading
to an overall RD performance loss of 2 dB for the cases
in the experiments. In addition, in Fig. 20, the RD loss of
frame-wise enhancement was due to encoding and decoding.
In Fig. 21, the RD loss resulted from ME acceleration and
encoding/decoding. In Fig. 22, the RD loss resulted from the
integration of the ME acceleration algorithm into the encoder and
decoder. Overall, however, the RD loss introduced by ME
acceleration and integration was small in PSNR terms, and
not visible subjectively.

We also measured the computational complexity of frame-wise
enhancement, acceleration with a separate ME module
and integration into an encoder or a decoder. The computational
cost was measured in terms of average time spent on
enhancement per frame. For the cases when the enhancement
was integrated into the codec, we did not count the actual
encoding or decoding time, so as to measure only the
enhancement itself. As shown in Table III, using a separate
ME module saved about 27.5% of the time on average compared
with the frame-wise algorithm. On the other hand, integrating
with the decoder saved 40% of the time compared with the
frame-wise algorithm, while integrating with the encoder saved about
77.3%.



VI. Conclusions 



In the paper, we propose a novel fast and efficient integrated 
algorithm for real-time enhancement of videos acquired under 
challenging lighting conditions including low lighting, bad 
weather (hazy, rainy, snowy) and high dynamic range con- 
ditions. We show that visually and statistically, hazy video 
and video captured in various challenging lighting conditions 
are very similar, and therefore a single core enhancement 
algorithm can be utilized in all cases, along with a proper 
pre-processing and an automatic impairment source detection 
module. We also describe a number of ways of reducing the 
computational complexity of the system while maintaining 
good visual quality, and the tradeoffs involved when the pro- 
posed system is integrated into different modules of the video 
acquisition, coding, transmission and consumption chain. 

Areas of further improvements include better pre-processing 
filters targeting specific sources of impairments, improved core 
enhancement algorithm, and better acceleration techniques. 
Also of great importance is a system that can process inputs 
with compounded impairments (e.g. video of foggy nights, 
with both haze and low lighting). 
TABLE III
Processing speeds of proposed algorithms over PC and iPhone4

Algorithm                                                            PC (ms per frame)   iPhone4 (ms per frame)   Time saved
Frame-wise enhancement algorithm                                     40.1                500.3                    N/A
Separate ME acceleration enhancement algorithm                       29.3                369                      27.5%
Integration of ME acceleration enhancement algorithm into encoder    9.2                 107.9                    77.3%
Integration of ME acceleration enhancement algorithm into decoder    24.8                302.4                    40.0%





Fig. 17. Examples of haze removal algorithm: Original input (Top), and the 
enhancement result (Bottom). 



Fig. 18. Examples of rainy video enhancement using haze removal algorithm: 
Original input (Top), and the enhancement result (Bottom). 
Fig. 19. Examples of snowy video enhancement using haze removal
algorithm: Original input (Top), and the enhancement result (Bottom).

Fig. 20. RD performance of frame-wise enhancement in encoder and decoder.

Fig. 21. RD performance of separate ME acceleration enhancement in
encoder and decoder.

Fig. 22. RD performance of integration of ME acceleration enhancement
into encoder and decoder.

Fig. 23. Examples of comparisons among the frame-wise algorithm and the
three proposed ME acceleration methods: Original input (Top left), output
of frame-wise algorithm (Top middle), output of separate ME acceleration
algorithm (Top right), output of integration of ME acceleration algorithm into
encoder (Bottom left) and decoder (Bottom right).



